## [1] "Mon Mar 25 10:48:18 2019"
Read the file of all studies in AACT.
## [1] "Total studies: 300214 ; unique NCT_IDs: 300214"
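Because the row count equals the number of unique NCT_IDs, the studies file holds exactly one row per study. Here is a minimal Python sketch of that check. The console output in this report looks like R, so this only illustrates the idea and is not the original code. The record layout (dicts with an `nct_id` key) and the helper name `summarize_ids` are assumptions.

```python
from collections import Counter

def summarize_ids(rows, key="nct_id"):
    """Return (total rows, unique IDs, sorted list of duplicated IDs)."""
    ids = [row[key] for row in rows]
    counts = Counter(ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return len(ids), len(counts), duplicates

# Hypothetical toy records standing in for the AACT studies file.
studies = [
    {"nct_id": "NCT00000001", "study_type": "Interventional"},
    {"nct_id": "NCT00000002", "study_type": "Observational"},
    {"nct_id": "NCT00000003", "study_type": "Interventional"},
]
total, unique, dups = summarize_ids(studies)
print(f"Total studies: {total} ; unique NCT_IDs: {unique}")
```

A non-empty `dups` list would mean some study appears on more than one row, and any later per-study counts would then be inflated.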
Select only studies whose `study_type` is Interventional.
## [1] "Interventional studies: 237892 (79.2%)"
| phase | N |
|---|---|
| Early Phase 1 | 2619 |
| Phase 1 | 29795 |
| Phase 1/Phase 2 | 10063 |
| Phase 2 | 41637 |
| Phase 2/Phase 3 | 4963 |
| Phase 3 | 29662 |
| Phase 4 | 25001 |
| NA | 94152 |
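The phase counts above sum to 237892, which is the number of interventional studies. So each study lands in exactly one row, and studies without a phase are reported as NA. Below is a minimal sketch of the filter-and-tabulate step. It assumes dict records with hypothetical `study_type` and `phase` keys, and it maps a missing phase (`None`) to "NA". The fixed row order copies the table above.

```python
from collections import Counter

PHASE_ORDER = ["Early Phase 1", "Phase 1", "Phase 1/Phase 2", "Phase 2",
               "Phase 2/Phase 3", "Phase 3", "Phase 4", "NA"]

def phase_table(studies):
    """Keep Interventional studies and count them by phase (missing phase -> 'NA')."""
    interventional = [s for s in studies if s.get("study_type") == "Interventional"]
    counts = Counter(s.get("phase") or "NA" for s in interventional)
    # Emit phases in a fixed order, skipping phases that do not occur.
    return [(p, counts[p]) for p in PHASE_ORDER if counts[p] > 0]

studies = [
    {"nct_id": "NCT1", "study_type": "Interventional", "phase": "Phase 2"},
    {"nct_id": "NCT2", "study_type": "Interventional", "phase": None},
    {"nct_id": "NCT3", "study_type": "Observational", "phase": None},
    {"nct_id": "NCT4", "study_type": "Interventional", "phase": "Phase 2"},
]
for phase, n in phase_table(studies):
    print(f"| {phase} | {n} |")
```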
Read the file of all drugs in AACT.
- `id` is the AACT intervention ID.
- Note that one study may involve multiple drugs.
## [1] "Unique drug names: 91347 ; unique intervention IDs: 255077"
Select only studies involving drugs.
## [1] "Drug trials: 124421 ; unique NCT_IDs: 124421"
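One study may list several drugs, so selecting "studies involving drugs" works as a semi-join: a qualifying study is kept once, no matter how many drug records it has. That is consistent with the equal counts of drug trials and unique NCT_IDs. Here is a minimal sketch. The field names (`nct_id`, `id`, `name`) and the helper `studies_with_drugs` are assumptions made for illustration.

```python
def studies_with_drugs(studies, drug_rows):
    """Semi-join: keep each study whose nct_id appears in at least one drug record."""
    drug_ids = {d["nct_id"] for d in drug_rows}
    return [s for s in studies if s["nct_id"] in drug_ids]

studies = [{"nct_id": "NCT1"}, {"nct_id": "NCT2"}, {"nct_id": "NCT3"}]
drug_rows = [
    {"id": 1, "nct_id": "NCT1", "name": "drug A"},
    {"id": 2, "nct_id": "NCT1", "name": "drug B"},  # same study, second drug
    {"id": 3, "nct_id": "NCT3", "name": "drug C"},
]
selected = studies_with_drugs(studies, drug_rows)
print(f"Drug trials: {len(selected)} ; unique NCT_IDs: {len({s['nct_id'] for s in selected})}")
```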
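Merge study metadata with drugs. The phase counts below sum to 255077, the number of unique intervention IDs, so they count drug interventions rather than studies: a study testing several drugs is counted once per drug.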
| phase | N |
|---|---|
| Early Phase 1 | 2615 |
| Phase 1 | 48593 |
| Phase 1/Phase 2 | 13288 |
| Phase 2 | 68850 |
| Phase 2/Phase 3 | 6503 |
| Phase 3 | 49507 |
| Phase 4 | 36331 |
| NA | 29390 |
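The merge produces one row per drug record. That explains why, for example, Phase 1 rises from 29795 studies to 48593 rows. Below is a minimal sketch of the inner join on `nct_id`, followed by a phase tally. The field names and the helper `merge_studies_drugs` are hypothetical. Drug records whose study is missing from `studies` are dropped, as in an inner join.

```python
from collections import Counter

def merge_studies_drugs(studies, drug_rows):
    """Inner join on nct_id: one output row per drug record whose study is known."""
    by_id = {s["nct_id"]: s for s in studies}
    merged = []
    for d in drug_rows:
        study = by_id.get(d["nct_id"])
        if study is not None:  # drug rows for unknown studies are dropped
            merged.append({**study, **d})
    return merged

studies = [
    {"nct_id": "NCT1", "phase": "Phase 1"},
    {"nct_id": "NCT2", "phase": "Phase 3"},
]
drug_rows = [
    {"id": 1, "nct_id": "NCT1", "name": "drug A"},
    {"id": 2, "nct_id": "NCT1", "name": "drug B"},
    {"id": 3, "nct_id": "NCT2", "name": "drug C"},
    {"id": 4, "nct_id": "NCT9", "name": "drug D"},  # no matching study
]
merged = merge_studies_drugs(studies, drug_rows)
phase_counts = Counter(row["phase"] for row in merged)
print(phase_counts)  # NCT1 contributes twice to Phase 1
```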
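AACT drug names are resolved to standard names and to chemical structures, represented as SMILES.
## [1] "Drugs with resolved structure: 180555 / 197300 (91.5%)"
The 197300 drug records by `overall_status` (the counts sum to 197300):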
| overall_status | N |
|---|---|
| Completed | 114900 |
| Recruiting | 23262 |
| Terminated | 15384 |
| Unknown status | 15111 |
| Active, not recruiting | 10409 |
| NA | 5675 |
| Not yet recruiting | 5604 |
| Withdrawn | 5475 |
| Enrolling by invitation | 741 |
| Suspended | 739 |
## Warning: Ignoring 1 observations
The same 197300 drug records by phase (the counts sum to 197300):
| phase | N |
|---|---|
| Early Phase 1 | 1916 |
| Phase 1 | 36516 |
| Phase 1/Phase 2 | 9476 |
| Phase 2 | 50770 |
| Phase 2/Phase 3 | 4830 |
| Phase 3 | 38473 |
| Phase 4 | 31452 |
| NA | 23867 |
## [1] "Mentions by intervention ID: 157862 / 171741 (91.9%)"
## [1] "Mentions by study: 92966 / 99647 (93.3%)"
## [1] "Mentions by drug name: 11108 / 58297 (19.1%)"
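The lines above, like the earlier structure-resolution line, report coverage in the form `hit / total (pct%)` with one decimal place. Here is a small formatter that reproduces that format. The name `coverage` is hypothetical, and this is not claimed to be the report's own code.

```python
def coverage(label: str, hit: int, total: int) -> str:
    """Format 'label: hit / total (pct%)' with the percentage rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{label}: {hit} / {total} ({100 * hit / total:.1f}%)"

print(coverage("Mentions by drug name", 11108, 58297))
```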
## [1] "PubChem SMILES2CID hits: 3960 / 4698 (84.3%)"
## [1] "Intervention IDs mapped to PubChem CIDs (via SMILES): 153876"
## [1] "PubChem CIDs with InChIKeys: 3801"
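The PubChem lines describe a chain of lookups: intervention ID, then SMILES, then PubChem CID, then InChIKey. Only 3960 SMILES2CID hits were found, yet 153876 intervention IDs map to CIDs. So many interventions must share a structure, and each later step can only keep or lose entries. Below is a minimal sketch with hypothetical in-memory lookup tables (plain dicts). It makes no calls to PubChem, and the CIDs and keys are placeholders.

```python
def chain_mappings(iid_to_smiles, smiles_to_cid, cid_to_inchikey):
    """Follow intervention ID -> SMILES -> PubChem CID -> InChIKey.

    Each step keeps only keys present in the next lookup table, so coverage
    can only shrink along the chain.
    """
    iid_to_cid = {
        iid: smiles_to_cid[smi]
        for iid, smi in iid_to_smiles.items()
        if smi in smiles_to_cid
    }
    cid_to_key = {
        cid: cid_to_inchikey[cid]
        for cid in set(iid_to_cid.values())
        if cid in cid_to_inchikey
    }
    return iid_to_cid, cid_to_key

# Hypothetical lookup tables; CIDs and keys are placeholders.
iid_to_smiles = {101: "CCO", 102: "CCO", 103: "c1ccccc1", 104: "C1CC1N"}
smiles_to_cid = {"CCO": 1001, "c1ccccc1": 1002}
cid_to_inchikey = {1001: "KEY-1001"}

iid_to_cid, cid_to_key = chain_mappings(iid_to_smiles, smiles_to_cid, cid_to_inchikey)
print(f"Intervention IDs mapped to CIDs: {len(iid_to_cid)}")
print(f"CIDs with InChIKeys: {len(cid_to_key)}")
```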
## Warning: 152 parsing failures.
## row col expected actual file
## 1028 biotherapeutic 1/0/T/F/TRUE/FALSE {'biocomponents': [], 'description': 'SUBSTANCE P', 'helm_notation': 'PEPTIDE1{R.P.K.P.Q.Q.F.F.G.L.M.[am]}$$$$', 'molecule_chembl_id': 'CHEMBL235363'} '../data/aact_drugs_inchi2chembl.tsv'
## 1028 helm_notation 1/0/T/F/TRUE/FALSE PEPTIDE1{R.P.K.P.Q.Q.F.F.G.L.M.[am]}$$$$ '../data/aact_drugs_inchi2chembl.tsv'
## 1367 biotherapeutic 1/0/T/F/TRUE/FALSE {'biocomponents': [], 'description': 'TERLIPRESSIN', 'helm_notation': 'PEPTIDE1{G.G.G.C.Y.F.Q.N.C.P.K.G.[am]}$PEPTIDE1,PEPTIDE1,9:R3-4:R3$$$', 'molecule_chembl_id': 'CHEMBL2135460'} '../data/aact_drugs_inchi2chembl.tsv'
## 1367 helm_notation 1/0/T/F/TRUE/FALSE PEPTIDE1{G.G.G.C.Y.F.Q.N.C.P.K.G.[am]}$PEPTIDE1,PEPTIDE1,9:R3-4:R3$$$ '../data/aact_drugs_inchi2chembl.tsv'
## 1389 biotherapeutic 1/0/T/F/TRUE/FALSE {'biocomponents': [], 'description': None, 'helm_notation': 'PEPTIDE1{A.S.T.T.T.N.Y.T}$$$$', 'molecule_chembl_id': 'CHEMBL180971'} '../data/aact_drugs_inchi2chembl.tsv'
## .... .............. .................. ..................................................................................................................................................................................... .....................................
## See problems(...) for more details.
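These parsing failures start at row 1028. That fits readr's default of guessing column types from the first 1000 rows (`guess_max`). If `biotherapeutic` and `helm_notation` are empty in those rows, they get guessed as logical, and every later value that fails to parse becomes NA, so these peptide records lose their annotations. In R, the usual remedy is to pass explicit `col_types`, for example `cols(.default = col_character())`. Below is a Python sketch of the same idea: read every field as text, then decode `biotherapeutic`, which in the log looks like a Python dict literal (single quotes, `None`). The column names other than `biotherapeutic` and `helm_notation` are assumptions.

```python
import ast
import csv
import io

def read_chembl_tsv(handle):
    """Read a TSV keeping every field as text, then decode 'biotherapeutic'.

    Empty cells become None; non-empty 'biotherapeutic' cells are parsed as
    Python literals with ast.literal_eval.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row = {k: (v if v != "" else None) for k, v in row.items()}
        if row.get("biotherapeutic") is not None:
            row["biotherapeutic"] = ast.literal_eval(row["biotherapeutic"])
        rows.append(row)
    return rows

# Toy file: one small molecule with empty columns, one peptide taken from the log.
sample = io.StringIO(
    "molecule_chembl_id\tbiotherapeutic\thelm_notation\n"
    "CHEMBL25\t\t\n"
    "CHEMBL235363\t{'biocomponents': [], 'description': 'SUBSTANCE P', "
    "'helm_notation': 'PEPTIDE1{R.P.K.P.Q.Q.F.F.G.L.M.[am]}$$$$', "
    "'molecule_chembl_id': 'CHEMBL235363'}\tPEPTIDE1{R.P.K.P.Q.Q.F.F.G.L.M.[am]}$$$$\n"
)
records = read_chembl_tsv(sample)
print(records[1]["biotherapeutic"]["description"])
```

`ast.literal_eval` only accepts literals, so unlike `eval` it cannot run arbitrary code found in the file.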
## [1] "ChEMBL compounds mapped via InChIKeys: 3332"
## [1] "ChEMBL activities (with pChembl): 124438"
## [1] "ChEMBL target proteins: 3157"
## [1] "ChEMBL target proteins mapped to TCRD (human): 1806"
## [1] "Organisms: 187"
## [1] " Homo sapiens: 1806"
## [2] " Rattus norvegicus: 529"
## [3] " Mus musculus: 238"
## [4] " Bos taurus: 98"
## [5] " Sus scrofa: 36"
## [6] " Cavia porcellus: 26"
## [7] " Escherichia coli K-12: 19"
## [8] " Oryctolagus cuniculus: 18"
## [9] " Escherichia coli: 17"
## [10] " Mycobacterium tuberculosis: 17"
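The ChEMBL steps narrow the data in stages: mapped compounds, then activities with a pChEMBL value, then the distinct target proteins those activities hit, then the organisms of those targets (187 in total, with the ten largest listed). Below is a minimal sketch of the last two stages. It assumes targets are collected from pChEMBL-bearing activities, and it uses a hypothetical record layout with the fields `target_chembl_id`, `pchembl_value` and `organism`.

```python
from collections import Counter

def targets_by_organism(activities, targets, top=10):
    """Count distinct targets hit by activities that have a pChEMBL value, per organism."""
    hit_ids = {a["target_chembl_id"] for a in activities if a.get("pchembl_value") is not None}
    organisms = Counter(targets[t]["organism"] for t in hit_ids if t in targets)
    return len(hit_ids), organisms.most_common(top)

activities = [
    {"target_chembl_id": "T1", "pchembl_value": 6.5},
    {"target_chembl_id": "T1", "pchembl_value": 7.0},   # same target, counted once
    {"target_chembl_id": "T2", "pchembl_value": None},  # no pChEMBL, ignored
    {"target_chembl_id": "T3", "pchembl_value": 5.2},
    {"target_chembl_id": "T4", "pchembl_value": 8.1},
]
targets = {
    "T1": {"organism": "Homo sapiens"},
    "T2": {"organism": "Homo sapiens"},
    "T3": {"organism": "Rattus norvegicus"},
    "T4": {"organism": "Rattus norvegicus"},
}
n_targets, top_organisms = targets_by_organism(activities, targets)
print(n_targets, top_organisms)
```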
TCRD target development levels (TDL) of the 1806 human targets:
## [1] " Tbio: 224" " Tchem: 868" " Tdark: 7"
## [4] " Tclin: 707"
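TCRD assigns each target one Target Development Level: Tclin, Tchem, Tbio or Tdark. Here the four counts sum to 1806, the number of human targets mapped to TCRD, so every mapped target received exactly one level. Below is a minimal sketch of the tally with a sanity check for unexpected level names. The target identifiers and the `target_to_tdl` lookup are hypothetical.

```python
from collections import Counter

TDL_LEVELS = ("Tclin", "Tchem", "Tbio", "Tdark")

def tdl_counts(human_targets, target_to_tdl):
    """Tally TDLs for targets present in a (hypothetical) target -> TDL lookup."""
    levels = [target_to_tdl[t] for t in human_targets if t in target_to_tdl]
    unknown = set(levels) - set(TDL_LEVELS)
    if unknown:
        raise ValueError(f"unexpected TDL values: {sorted(unknown)}")
    counts = Counter(levels)
    # Each mapped target contributes one level, so the values sum to len(levels).
    return {level: counts[level] for level in TDL_LEVELS}, len(levels)

human_targets = ["TGT1", "TGT2", "TGT3", "TGT4"]
target_to_tdl = {"TGT1": "Tclin", "TGT2": "Tchem", "TGT4": "Tclin"}
by_level, n_mapped = tdl_counts(human_targets, target_to_tdl)
print(by_level, n_mapped)
```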